165 research outputs found

    Effect of nitrogen fertilization and cutting time of the regrowth on the yield of timothy ley


    A Profile-Based Method for Authorship Verification

    Authorship verification is one of the most challenging tasks in style-based text categorization. Given a set of documents, all by the same author, and another document of unknown authorship, the question is whether or not the latter is also by that author. Recently, in the framework of the PAN-2013 evaluation lab, a competition in authorship verification was organized, and the vast majority of submitted approaches, including the best performing models, followed the instance-based paradigm, where each text sample by one author is treated separately. In this paper, we show that the profile-based paradigm (where all samples by one author are treated cumulatively) can be very effective, surpassing the performance of the PAN-2013 winners without using any information from external sources. The proposed approach is fully trainable, and we demonstrate an appropriate tuning of parameter settings for the PAN-2013 corpora, achieving accurate answers especially when the cost of false negatives is high.

    Overview of PAN'17: Author Identification, Author Profiling, and Author Obfuscation

    The PAN 2017 shared tasks on digital text forensics were held in conjunction with the annual CLEF conference. This paper gives a high-level overview of each of the three shared tasks organized this year, namely author identification, author profiling, and author obfuscation. For each task, we give a brief summary of the evaluation data, performance measures, and results obtained. Altogether, 29 participants submitted a total of 33 pieces of software for evaluation, whereas 4 participants submitted to more than one task. All submitted software has been deployed to the TIRA evaluation platform, where it remains hosted for reproducibility purposes. The work at the Universitat Politècnica de València was funded by the MINECO research project SomEMBED (TIN2015-71147-C2-1-P).
    Potthast, M.; Rangel-Pardo, FM.; Tschuggnall, M.; Stamatatos, E.; Rosso, P.; Stein, B. (2017). Overview of PAN'17: Author Identification, Author Profiling, and Author Obfuscation. Lecture Notes in Computer Science. 10456:275-290. https://doi.org/10.1007/978-3-319-65813-1_25

    "I need adults' help with the tasks, but they are really nice to do.": preschoolers' views of their strengths, interests, and support needs, told with the help of the Keskustelu symbolein game

    In pre-primary education, a pedagogical plan is drawn up for each child in cooperation with the child, the guardians, and the teachers, and it guides the child's activities throughout the pre-primary year. This individual, goal-oriented planning starts from the child's strengths, interests, and support needs. In pedagogical documents, the child's own perspective is visible to varying degrees. The purpose of this study is to find out how children describe their own strengths, interests, and support needs with the help of the Keskustelu symbolein game, and how the game supports the child's expression. The study also describes how the child's strengths, interests, and support needs are presented in the child's pedagogical documents. The data for this qualitative study consisted of photographs of five children's Keskustelu symbolein game boards and of the children's pedagogical documents. The data were analysed in four stages, combining methods from content analysis and Barthes's semiotic model of image interpretation. The results showed that the child's active role and own views came out strongly in the Keskustelu symbolein game. The game also helped the children recognise and reflect more closely on their own strengths, interests, and support needs. The study further showed that the game supported the child's expression and was therefore a workable method for use in development discussions. The study concludes that the Keskustelu symbolein game brings out children's own perspectives on their skills more broadly and in a more multifaceted way than what had been recorded in their pedagogical documents. When playing the game, the child has a more active role and control over what they tell. By placing pictures on the different boards, the child can express their views even when their language skills would not otherwise suffice. In addition, the child gets to participate in and influence the making of their own pedagogical document when the views they expressed in the game are recorded in it.

    Overview of PAN 2018. Author identification, author profiling, and author obfuscation

    PAN 2018 explores several authorship analysis tasks enabling a systematic comparison of competitive approaches and advancing research in digital text forensics. More specifically, this edition of PAN introduces a shared task in cross-domain authorship attribution, where texts of known and unknown authorship belong to distinct domains, and another task in style change detection that distinguishes between single author and multi-author texts. In addition, a shared task in multimodal author profiling examines, for the first time, a combination of information from both texts and images posted by social media users to estimate their gender. Finally, the author obfuscation task studies how a text by a certain author can be paraphrased so that existing author identification tools are confused and cannot recognize the similarity with other texts of the same author. New corpora have been built to support these shared tasks. A relatively large number of software submissions (41 in total) was received and evaluated. Best paradigms are highlighted while baselines indicate the pros and cons of submitted approaches. The work at the Universitat Politècnica de València was funded by the MINECO research project SomEMBED (TIN2015-71147-C2-1-P).
    Stamatatos, E.; Rangel-Pardo, FM.; Tschuggnall, M.; Stein, B.; Kestemont, M.; Rosso, P.; Potthast, M. (2018). Overview of PAN 2018. Author identification, author profiling, and author obfuscation. Lecture Notes in Computer Science. 11018:267-285. https://doi.org/10.1007/978-3-319-98932-7_25

    Overview of the author identification task at PAN 2014

    The author identification task at PAN-2014 focuses on author verification. Similar to PAN-2013 we are given a set of documents by the same author along with exactly one document of questioned authorship, and the task is to determine whether the known and the questioned documents are by the same author or not. In comparison to PAN-2013, a significantly larger corpus was built comprising hundreds of documents in four natural languages (Dutch, English, Greek, and Spanish) and four genres (essays, reviews, novels, opinion articles). In addition, more suitable performance measures are used focusing on the accuracy and the confidence of the predictions as well as the ability of the submitted methods to leave some problems unanswered in case there is great uncertainty. To this end, we adopt the c@1 measure, originally proposed for the question answering task. We received 13 software submissions that were evaluated in the TIRA framework. Analytical evaluation results are presented where one language-independent approach serves as a challenging baseline. Moreover, we continue the successful practice of the PAN labs to examine meta-models based on the combination of all submitted systems. Last but not least, we provide statistical significance tests to demonstrate the important differences between the submitted approaches
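    The c@1 measure mentioned above can be illustrated with a short sketch. It uses the usual definition from the question answering literature, c@1 = (n_c + n_u * n_c / n) / n, where n is the number of problems, n_c the number answered correctly, and n_u the number left unanswered. The function name `c_at_1` and the convention that a score of exactly 0.5 marks an unanswered problem are assumptions made for this example, not details taken from the abstract.

```python
def c_at_1(scores, truth, unanswered=0.5):
    """Compute c@1 = (n_c + n_u * n_c / n) / n.

    scores: one score in [0, 1] per verification problem (hypothetical
            convention: > 0.5 means "same author", < 0.5 means "different
            author", exactly 0.5 means the problem is left unanswered).
    truth:  one boolean per problem, True when the questioned document
            really is by the same author.
    """
    if len(scores) != len(truth):
        raise ValueError("scores and truth must have the same length")
    n = len(scores)
    if n == 0:
        raise ValueError("at least one problem is required")

    # Problems the system declined to answer.
    n_u = sum(1 for s in scores if s == unanswered)
    # Answered problems whose binary decision matches the ground truth.
    n_c = sum(
        1
        for s, t in zip(scores, truth)
        if s != unanswered and (s > unanswered) == t
    )
    # Unanswered problems are credited at the accuracy rate n_c / n.
    return (n_c + n_u * n_c / n) / n


if __name__ == "__main__":
    scores = [0.9, 0.2, 0.5, 0.7]
    truth = [True, False, True, False]
    print(c_at_1(scores, truth))
```

    When every problem is answered, n_u is 0 and the measure reduces to plain accuracy; leaving a problem unanswered is credited at the overall accuracy rate rather than counted as an error, so a system gains from abstaining on problems it would otherwise get wrong.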

    Plagiarism and authorship analysis: introduction to the special issue


    Morpho-syntactic processing of Arabic plurals after aphasia: dissecting lexical meaning from morpho-syntax within word boundaries

    Within the domain of inflectional morpho-syntax, differential processing of regular and irregular forms has been found in healthy speakers and in aphasia. One view assumes that irregular forms are retrieved as full entities, while regular forms are compiled on-line. An alternative view holds that a single mechanism oversees regular and irregular forms. Arabic offers an opportunity to study this phenomenon, as Arabic nouns contain a consonantal root, delivering lexical meaning, and a vocalic pattern, delivering syntactic information, such as gender and number. The aim of this study is to investigate morpho-syntactic processing of regular (sound) and irregular (broken) Arabic plurals in patients with morpho-syntactic impairment. Three participants with acquired agrammatic aphasia produced plural forms in a picture-naming task. We measured overall response accuracy, then analysed lexical errors and morpho-syntactic errors separately. Error analysis revealed different patterns of morpho-syntactic errors depending on the type of pluralization (sound vs broken). Omissions formed the vast majority of errors in sound plurals, while substitution was the only error mechanism that occurred in broken plurals. The dissociation was statistically significant for retrieval of morpho-syntactic information (vocalic pattern) but not for lexical meaning (consonantal root), suggesting that the participants' selective impairment was an effect of the morpho-syntax of plurals. These results suggest that irregular plural forms are stored, while regular forms are derived. The current findings support the findings from other languages and provide a new analysis technique for data from languages with non-concatenative morpho-syntax.

    Feature extraction and selection for Arabic tweets authorship authentication

    © 2017, Springer-Verlag Berlin Heidelberg. In tweet authentication, we are concerned with correctly attributing a tweet to its true author based on its textual content. The more general problem of authenticating long documents has been studied before, and the most common approach relies on the intuitive idea that each author has a unique style that can be captured using stylometric features (SF). Inspired by the success of modern automatic document classification, some researchers followed the Bag-Of-Words (BOW) approach for authenticating long documents. In this work, we consider both approaches and their application to authenticating tweets, which pose additional challenges because of their limited length. We focus on the Arabic language due to its importance and the scarcity of work on it. We create different sets of features from both approaches and compare the performance of different classifiers using them. We experiment with various feature selection techniques in order to extract the most discriminating features. To the best of our knowledge, this is the first study of its kind to combine these different sets of features for authorship analysis of Arabic tweets. The results show that combining all the feature sets we compute yields the best results.

    Profiling idioms: a sociolexical approach to the study of phraseological patterns

    Conference paper presented at the international conference 'Computational and Corpus-based Phraseology' (Europhras 2019), 25-27 September 2019, Malaga, Spain. This paper introduces a novel approach to the study of lexical and pragmatic meaning called 'sociolexical profiling', which aims at correlating the use of lexical items with author-attributed demographic features, such as gender, age, profession, and education. The approach was applied to a case study of a set of English idioms derived from the Pattern Dictionary of English Verbs (PDEV), a corpus-driven lexical resource which defines verb senses in terms of the phraseological patterns in which a verb typically occurs. For each selected idiom, a gender profile was generated based on data extracted from the Blog Authorship Corpus (BAC) in order to establish whether any statistically significant differences can be detected in the way men and women use idioms in everyday communication. A quantitative and qualitative analysis of the gender profiles was subsequently performed, enabling us to test the validity of the proposed approach. If performed on a large scale, we believe that sociolexical profiling will have important implications for several areas of research, including corpus lexicography, translation, creative writing, forensic linguistics, and natural language processing.